ABSTRACT: Over the last three decades, there has been a significant evolution in storage
technologies supporting archival of remote sensing data. This section provides a brief survey of
how these technologies have evolved. Three main technologies are considered: tape, hard disk and
solid state disk. Their historical evolution is traced, summarizing how reductions in cost have
made it possible to store larger volumes of data on faster media. The cost per GB of media is
only one of the considerations in determining the best approach to archival storage. Active
archives generally require faster response to user requests for data than permanent archives.
Archive costs must also account for facilities and other capital costs, operations costs, software
licenses, utilities costs, etc. To meet the requirements of any organization, a mix of
technologies is typically needed.


1 INTRODUCTION 

Over the last three decades, there has been a significant evolution in storage technologies 
supporting archival of remote sensing data. This section provides a brief survey of how these 
technologies have evolved. First, we consider the distinction between active and long-term or 
permanent archives. 

1.1 Active Archives

We define active archives as facilities that store data that are in active use by the community. 
Typically active archives ingest raw data and/or derived digital products (hereafter simply re- 
ferred to as “data”) from active remote sensing missions, even though serving a user community 
with the stored data may proceed well beyond the life of active missions. Given the active user 
community, it is necessary for active archives to be responsive to the community requirements. 
Typically, relatively fast access is needed. Data need to be backed up and restored promptly
while the system continues to operate in a responsive manner. Support staff is needed to assist
data providers in setting up mechanisms for product generation and delivery to the archive, and
in resolving any problems that arise during production and ingest operations. Also, user services are
needed to help consumers of data with answers to questions they may have about either the me- 
chanics of obtaining the data or scientific questions about the data themselves. When the mis- 
sions are active, expert consultation is generally available to user services staff from scientists 
associated with the mission. It is also the active archive’s responsibility to prepare for perma- 
nent archive of data at the end of the active archive phase, whether the data are transferred to 
another organization or continue to be held at the same organization. 

From a hardware standpoint, an active archive might use a tiered storage mechanism to optimize 
performance. Tiered storage provides access to data across a virtualized storage system (see
http://en.wikipedia.org/wiki/Active_Archive). The data migrate between several systems that
use different types of media for storage. Data that are accessed more frequently and need to be
provided quickly would reside on more expensive media and storage systems, while other types of
data would be on less expensive hardware. The migration among the systems is handled auto- 
matically. Metadata keep track of where the data are. The data are available on primary, second- 
ary and tertiary systems, providing on-line or near-line accessibility. 
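
The tiered mechanism described above can be illustrated with a minimal sketch. It assumes a simple policy in which access counts over a review period decide the tier. The class name, tier names and thresholds are hypothetical and are not taken from any particular storage product.

```python
from dataclasses import dataclass, field


@dataclass
class TieredCatalog:
    """Metadata catalog recording which tier holds each data item (illustrative only)."""
    hot_threshold: int = 10    # accesses per period needed to stay on the primary tier
    warm_threshold: int = 2    # accesses per period needed to stay on the secondary tier
    location: dict = field(default_factory=dict)   # item id -> tier name
    accesses: dict = field(default_factory=dict)   # item id -> accesses this period

    def ingest(self, item_id, tier="primary"):
        """Register a new item; new data start on the fastest tier."""
        self.location[item_id] = tier
        self.accesses[item_id] = 0

    def read(self, item_id):
        """Count the access and return the tier the item must be fetched from."""
        self.accesses[item_id] += 1
        return self.location[item_id]

    def migrate(self):
        """Move each item to the tier matching its access count, then reset the counts."""
        moves = {}
        for item_id, count in self.accesses.items():
            if count >= self.hot_threshold:
                target = "primary"
            elif count >= self.warm_threshold:
                target = "secondary"
            else:
                target = "tertiary"
            if target != self.location[item_id]:
                moves[item_id] = (self.location[item_id], target)
                self.location[item_id] = target
            self.accesses[item_id] = 0
        return moves
```

In this sketch the catalog plays the role of the metadata that keep track of where the data are. A real system would also move the bytes between devices and would typically weigh file size and age in addition to access frequency.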

1.2 Permanent Archives 

Permanent archives store data “forever”, long after the data cease to be in active use. Quick ac- 
cess to data may not be an essential requirement. However, it should be possible to obtain the 
data when needed, for example, for retrospective studies that might occur, say, 30 years after the 
active usage ended. The level of service to users may not be as high as it is in active archives. 
Experts directly involved in the missions would no longer be available for consultation. Thus 
one needs to depend on archived documentation, which must be complete to enable a diligent 
user to comprehend how the data had been generated. 

Thus, from a hardware standpoint, a permanent archive may use less expensive and less respon- 
sive storage systems than an active archive. However, in both the active and permanent archives, 
preservation with no loss is equally important. Preservation requires: 

• No loss of bits 

• Discoverability and accessibility 

• Readability 

• Understandability 

• Usability 

• Reproducibility of results 

From the point of view of hardware, the first and third bullets above are significant. The remain- 
ing bullets are also very important for preservation, but the actions to be taken to enable them 
are not within the scope of hardware solutions. Migration to newer media and reader technolo- 
gies is essential to ensure no loss of bits over time and readability of data. Changes in technolo- 
gy over time provide less expensive and faster storage with greater capacity, enabling us to ar- 
chive ever-growing volumes of data. However, they also require frequent (perhaps continuous) 
migrations of data to newer media. 


2 STORAGE TECHNOLOGIES 

There are several recent publications tracking the evolution of storage technologies and the re- 
ductions in costs per unit of archival storage over the last three decades. An interesting history 
of various types of computer devices including storage can be found in Computer History Museum
(2015). Magnetic tapes, Hard Disk Drives and Solid State Disks/Drives are the major
technologies that have been used for bulk storage. These are discussed briefly below. Note
that the names of companies and products given below are meant only to be illustrative
of the technologies and capacities achieved as a part of storage technology evolution. Clearly it
is beyond the scope of this section to cover all the storage products that have been made
available in industry. The interested reader is encouraged to pursue the references provided here
for more details.

2.1 Magnetic Tapes

Magnetic tape storage technology, first patented by a German engineer, Fritz Pfleumer, in 1928
(Zetta, Inc. 2015), has been evolving over time and is still in use for bulk storage applications
due to its low cost, portability and unlimited off-line capacity. Magnetic tapes were used for
audio recording in the 1930s and were first used for data storage by UNIVAC in 1951. Circa
1970, IBM introduced the 10.5” standard tape reels. This standard lasted for over 25 years with
various lengths (1,200 feet, 2,400 feet and 3,600 feet), numbers of tracks (7 and 9), and
recording densities (ranging from 200 characters per inch to 6,250 characters per inch). The Digital
Equipment Corporation’s (DEC) CompacTape Cartridge replaced the 1960s tape technology and
was later standardized as Digital Linear Tape (DLT). The DLT technology evolved from 92 MB
capacities in 1984 to 800 GB capacities in 2006 (SuperDLT format). Cartridges and cassettes,
consisting of tape reels that are completely enclosed in a plastic casing, have come into
common use since audio compact cassettes were used in home computers as inexpensive storage in
the 1970s and 1980s. As of 2014, various cartridge formats are in use: Digital Data Storage
(DDS), a format for storing computer data on a Digital Audio Tape (DAT); Digital Linear Tape
(DLT); and Linear Tape-Open (LTO). Steady increases in cartridge capacities are exemplified by
the evolution of generations of LTO. LTO-1, in the year 2000, had a capacity of 100 GB, while
LTO-6 in 2012 had a capacity of 2.5 TB. It is anticipated that LTO-10 will have a capacity of 48
TB.
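
The capacity figures quoted above can be turned into an approximate growth rate. The following is a minimal sketch that assumes steady exponential growth between the two end points of each format, which is a simplification. The function names are illustrative.

```python
import math


def doubling_time_years(capacity_start, capacity_end, years):
    """Years needed for capacity to double, assuming steady exponential growth."""
    return years * math.log(2) / math.log(capacity_end / capacity_start)


def annual_growth_rate(capacity_start, capacity_end, years):
    """Compound annual growth rate as a fraction (0.25 means 25% per year)."""
    return (capacity_end / capacity_start) ** (1 / years) - 1


# Capacities in GB, taken from the text above (decimal units assumed).
lto = doubling_time_years(100, 2_500, 2012 - 2000)    # LTO-1 to LTO-6
dlt = doubling_time_years(0.092, 800, 2006 - 1984)    # DLT, 92 MB to 800 GB

print(f"LTO doubling time: {lto:.1f} years")
print(f"DLT doubling time: {dlt:.1f} years")
print(f"LTO annual growth: {annual_growth_rate(100, 2_500, 12):.0%}")
```

Under these assumptions, LTO cartridge capacity doubled roughly every two and a half years between 2000 and 2012, and DLT capacity doubled somewhat faster between 1984 and 2006. Actual capacities increased in discrete steps with each generation rather than smoothly.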

Magnetic tapes can be stored off-line or in “near-line” tape libraries. With off-line storage, a 
human operator needs to mount a tape on a tape drive in order to read or write data. Near-line 
tape libraries include a robotic device that is controlled to access and mount the tape of interest 
on a tape drive to permit reading and writing. The IBM 3850 mass storage system, announced in 
1974, was one of the earliest examples, consisting of a number of cylindrical cartridges held in a 
hexagonal array of bins. Data were transferred automatically between higher-speed disk drives 
(on-line storage) and the cartridges. The capacity of the mass storage systems ranged from 35.3 
GB to 472 GB, depending on the model. This series was discontinued in 1986. Over the past
three decades, use of near-line libraries has become common. From the late 1990s through the
mid-2000s, the NASA Earth Observing System (EOS) Data and Information System (EOSDIS) used robotic
tape silos for near-line storage of most (several petabytes) of the EOS data and derived digital
products in its Distributed Active Archive Centers (DAACs). Access to data from near-line
storage can be significantly slower than that from on-line spinning disks. (Today, the DAACs
use on-line spinning disks for most of the archive storage while using near-line capacity for
back-up.) Of course, access with near-line robotic tape silos would typically be much faster
and less subject to human errors than access from off-line storage requiring tape mounts by
operators. As of 2014, there are near-line mass storage systems with multi-exabyte capacities
(Oracle, 2015).

2.2 Hard Disk Drives (HDD) 

A history of the evolution of hard disk storage and a detailed time line from 1956 to 2014 can be
seen in Wikipedia Contributors (2014). The following is a short summary of highlights from
that article. The first commercial hard drives were introduced by IBM in 1956 with a capacity of
5 million 6-bit characters. In 1965, the IBM 2314 was introduced with removable disk packs of
11 disks for a total capacity of 29 MB. In 1975, the IBM 3350 "Madrid" was brought to market,
re-introducing disk drives with fixed disks. The capacity of Madrid was 317.5 MB
per drive, for a capacity of over 2 GB per string consisting of eight 14" disks. In 1980, Seagate
Technology (then Shugart Technology) introduced the ST-506, the first 5.25 inch hard disk
drive, which had a capacity of 5 MB. Also in 1980, the IBM 3380 came on the market. It was
the world's first gigabyte-capacity disk drive (2.52 GB), was the size of a refrigerator, and
weighed 249 kg. In 1988, the PrairieTek 220 was introduced as the first 2.5 inch hard drive; it
had a capacity of 20 MB and was suitable for portable computers. In 1997, IBM introduced the
Deskstar 16GP “Titan” with five 3.5 inch disks and a capacity of 16.8 GB. This was significant
in that it was the first commercial use of Giant Magnetoresistance heads. Also in 1997, Seagate
brought to market the Medalist Pro 9140 (ST39140A) with a 9.1 GB capacity, the first hard
drive with fluid bearings. There were several key developments in 2005: the first 500 GB hard
drive was shipped by Hitachi GST (HGST), Serial ATA 3 Gb/sec was standardized, Seagate
introduced the Tunnel Magneto-Resistive Read Sensor and Thermal Spacing Control, faster Serial
Attached SCSI was introduced, and Toshiba shipped the first perpendicular magnetic recording
hard disk drive (1.8 inch, 40/80 GB). The years 2007 through 2011 saw a few firsts in the
capacities of hard disks, starting from 1 TB (2007, HGST) to 4 TB (2011, Seagate). In 2013, HGST
announced a helium-filled 6 TB hard disk drive for enterprise applications. In 2014, Seagate
shipped the first 8 TB hard drives.

The costs of hard disk drives from 1980 to present are summarized by M. Komorowski (2009
& 2014). Figure 1 is adapted from his articles. Note the logarithmic scale in the figure. It shows
that the cost per gigabyte has dropped from $700K in 1981 to between $0.03 and $0.06 in 2014.
Komorowski shows a regression model indicating a doubling of storage capacity per unit cost
every 14 months. Other examples of such cost trends are compiled by Smith (2014) and
McCallum (2014).

Figure 1. Hard Drive Costs per Gigabyte - 1980 to 2014. (Credit M. Komorowski).
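
The doubling period quoted above can be related to the end-point costs with a short calculation. This is a minimal sketch that assumes purely exponential decline. The function names are illustrative and are not taken from Komorowski's articles.

```python
import math


def implied_doubling_months(cost_start, cost_end, months):
    """Months for capacity per dollar to double, given cost per GB at two dates."""
    return months * math.log(2) / math.log(cost_start / cost_end)


def projected_cost(cost_start, months, doubling_months=14.0):
    """Cost per GB after `months`, assuming it halves every `doubling_months`."""
    return cost_start * 0.5 ** (months / doubling_months)


span_months = (2014 - 1981) * 12
for cost_2014 in (0.03, 0.06):
    months = implied_doubling_months(700_000, cost_2014, span_months)
    print(f"${cost_2014:.2f}/GB in 2014 implies doubling every {months:.1f} months")
```

Using only the two end points ($700K per GB in 1981 and $0.03 to $0.06 per GB in 2014) gives a doubling period of roughly 16 to 17 months. This is somewhat longer than the 14 months from the regression, which is fitted to all of the data points and therefore need not pass through the end points.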


2.3 Solid-State Drives or Solid-State Disks (SSD) 

Devices called Solid-State Drives or Solid-State Disks (SSD) are neither drives nor disks. They
are storage devices like HDDs, but use integrated circuit assemblies for persistent storage of
data. They also use electronic interfaces that are compatible with Hard Disk Drives (HDDs), but
provide significantly higher input/output performance. SSDs differ from HDDs, floppy disks or
tape drives in that they do not have any moving components and thus are resistant to physical
shocks and run silently. Most SSDs use NAND-based flash memory, which can retain data
without a constant supply of electric power. For faster access, Random Access Memory (RAM)
SSDs can be used, but they require some source of electric power to retain data. (See Wikipedia
Contributors, 2015b).

A time-line of the evolution of SSDs from 1976 to 2014 is given by Zsolt Kerekes (2015).
The following is a brief summary of its highlights. In 1976, Dataram brought to market an
SSD called BULK CORE. It emulated hard disks and had a capacity of 2 MB. In 1978, 1 GB of
RAM SSD would have cost $1M. Texas Memory Systems introduced a 16 KB RAM SSD for
accelerating field seismic data acquisition for oil companies. In 1982, SemiDisk Systems
shipped SSD accelerators for the Personal Computer market, initially with a capacity of 512 KB
and later with a capacity of 2 MB. In 1990, NEC introduced 5.25” SCSI SSDs that used RAM
technology and were backed up with internal batteries. By 1996, with ATTO Technology’s
introduction of SiliconDisk II, RAM SSD capacities had gone up to 1.6 GB with a throughput of 80
MB/s and 22,000 input/output operations per second (IOPS). In 1999, BiTMICRO introduced a
flash SSD with a capacity of 18 GB. By the end of 1999, there were at least 11 manufacturers of
SSDs. In November 2000, BiTMICRO launched the first hot-swappable 3.5” SCSI SSD. In
2001, Winchester Systems introduced FlashSSD as an option in its OpenRAID Storage Area
Network products for use on a small percentage of “hot files” that account for a majority of disk
access requests. FlashSSD provided consistent performance of 12K IOPS and 40 MB/s
throughput. Also in 2001, Texas Memory Systems began promoting its RamSan-210, a RAM SSD with
32 GB capacity, 100K IOPS and 20 microsecond access times. In 2003, SSDs with a capacity of
1 TB became commercially available. In 2005, M-Systems announced the availability of the
industry's highest capacity 2.5" Serial Advanced Technology Attachment (SATA) SSD, with 128 GB of
storage capacity. Also, Texas Memory Systems launched SSDs with a 4 Gb/s Fibre Channel
interface offering up to 128 GB capacity and 500,000 random IOPS performance. In 2006,
1.8" 32 GB flash SSDs from Samsung hit the market. In 2008, the number of Original Equipment
Manufacturers (OEMs) of SSDs reached 100. In 2010, Texas Memory Systems announced
the availability of the RamSan-630 SSD with 4 to 10 TB capacity, 500,000 IOPS, and 8 GB/s
bandwidth. Foremay announced its 2 TB 3.5" and 1 TB 2.5" SATA flash SSDs with read/write
speeds of up to 200 MB/s. Fusion-io set speed records by achieving 1 million IOPS and 6.2 GB/s
bandwidth, and offered capacity up to 5.7 TB. In late 2011, BiTMICRO announced a new
generation of enterprise SSD controllers that could deliver up to 400K IOPS and a capacity of 5 TB
for availability in the first half of 2012. In 2012, HGST demonstrated the industry's first 12 Gb/s
Serial Attached SCSI (SAS) SSD. In 2013, Micron announced a new model of hot-swappable 2.5”
Peripheral Component Interconnect Express (PCIe) SSDs with up to 1.4 TB multi-level cell
(MLC) capacity, which could deliver 750K IOPS. Samsung entered the 2.5” PCIe
SSD market with its NVMe SSD, which had up to 1.6 TB capacity, read throughput of 3 GB/s
and up to 740K IOPS. In 2014, Samsung provided a comparison of speeds of 2.5” SSDs using
SAS and PCIe technologies and showed that its PCIe SSDs were 3 times faster than the SAS
SSDs. SanDisk started sampling 2.5” 4 TB SAS SSDs. Skyera launched its 136 TB, 1U (i.e.,
1.75” high) rack-mounted SSD called the SkyHawk FS.

A comparison of average prices per gigabyte of HDD and SSD over the period 1996 through
2012 is shown in Figure 2 (from Royal Pingdom 2011). The cost of hard disks and drives has
dropped significantly over the past three decades, making their use feasible for petabyte scale
data archives. Where high throughput performance is a requirement, Solid State Disks (SSDs)
have come into use in recent years. SSDs are significantly more expensive than HDDs, as shown in
Figure 2. However, the cost differential between HDD and SSD has been dropping significantly
over the years. The cost of SSD per GB was 120 times that of HDD in 2007, while in 2011 the same
ratio was 32. By 2014, this ratio had dropped to about 25. Vendor advertisements in January
2015 showed a ratio of 8 to 10. It is difficult to predict whether the two costs will become
comparable in the future. (See Baxter (2014) for a comparison of HDD and SSD and a discussion of
pros and cons.)



Figure 2. Comparison of Cost of Hard Disk Drives and Solid State Drives (Credit: Royal Pingdom) 



3 CASE STUDY - NASA’S EOSDIS 


NASA’s Earth Observing System Data and Information System (EOSDIS) is a large system
with 12 Distributed Active Archive Centers across the United States. EOSDIS manages most of
NASA’s Earth science data from satellite missions, aircraft investigations, field campaigns and
other sources. At the end of 2014, EOSDIS archived approximately 10 PB of data. The EOSDIS
Core System (ECS) provides common “core” hardware and software capabilities to three
DAACs: the Atmospheric Science Data Center at NASA Langley Research Center, Hampton,
Virginia; the Land Processes DAAC at USGS EROS in Sioux Falls, South Dakota; and the
National Snow and Ice Data Center at the University of Colorado in Boulder, Colorado. For
purposes of illustration, the storage technologies employed in the ECS archives are discussed
below.

The ECS has been in operation since late 1999 and has been supporting archiving and
distribution of data from the EOS satellite missions. It has evolved from near-line tape-based
robotic archives to on-line disk-based archives. Behnke et al. (2005) describe the technology
changes from the beginning of the ECS design in 1995 through 2005. Initially, all of the data were
stored in robotic tape silos, with on-line disk storage being used to cache the data. As the cost
of disk storage decreased, it became feasible to provide some of the data on-line. The concept of
“data pools” was introduced in 2001 (Moore and Lowe, 2002). Data Pools were large (tens of TB)
caches of popular datasets that could be directly downloaded by users, thus reducing the latency
in meeting user requests. With further reductions in disk costs, most of the data are now held
on-line. This also helps in providing other on-line services to users, such as subsetting,
reprojection and visualization, upon request.

Regarding the utilization of disk and tape technologies for back-ups of archives at all the
EOSDIS DAACs, the following observations can be made. There is an equal mix of tape and disk
based on product count. Disk is a popular medium for smaller volume products and tape for larger
volume products. Transition from tape to disk based backup has been driven by reduced disk
costs, improved restore input/output speeds from disk, and lower error rates in stored disks. A
small number of products are backed up on CD/DVD/Blu-ray (optical media).

The DAACs have automated systems to manage ingest, archiving and back-up of data. In
particular, the back-up system used at the three DAACs mentioned above that have the EOSDIS
Core System is a tiered storage management system using StorNext. This provides seamless access
to data held on disk and tape media. The data are ingested onto a set of archive disks. They are
then copied from the archive disk to tape for a complete back-up. Copies to tape are determined
by a set of configurable policies and generally occur after a set period of time, when a data
volume threshold is reached for specified datasets, or when an archive disk reaches a capacity
threshold.
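
The three kinds of policy triggers described above can be sketched as follows. This is an illustration only: the class, function and field names are hypothetical and do not represent the StorNext configuration interface or the actual ECS policy settings.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class BackupPolicy:
    """Hypothetical per-dataset policy for copying data from archive disk to tape."""
    max_age: timedelta             # copy once the oldest uncopied data have waited this long
    volume_threshold_bytes: int    # copy once this much uncopied data has accumulated
    disk_fill_threshold: float     # copy once the archive disk is this full (0.0 to 1.0)


def copy_triggers(policy, now, oldest_pending, pending_bytes, disk_fill):
    """Return the reasons (possibly none) for which a disk-to-tape copy is due."""
    reasons = []
    if now - oldest_pending >= policy.max_age:
        reasons.append("age")
    if pending_bytes >= policy.volume_threshold_bytes:
        reasons.append("volume")
    if disk_fill >= policy.disk_fill_threshold:
        reasons.append("disk_capacity")
    return reasons


# Example values are invented for illustration.
policy = BackupPolicy(timedelta(hours=24), 10**12, 0.8)
now = datetime(2015, 1, 2, 0, 0)
print(copy_triggers(policy, now, datetime(2015, 1, 1, 0, 0), 5 * 10**11, 0.5))
```

A copy is due when any one of the conditions holds, which matches the "or" in the description above. Returning the list of reasons rather than a single flag makes it easier to log why a copy was started.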


4 CONCLUSION

This section provides a discussion of three main technologies for archival storage and traces
their historical evolution, summarizing how reductions in cost have made it possible to store
larger volumes of data on faster media. The cost per GB of media is only one of the
considerations in determining the best approach to archival storage. Active archives generally
require faster response to user requests for data than permanent archives. Archive costs must
also account for facilities and other capital costs, operations costs, software licenses,
utilities costs, etc. An example of such an analysis by the San Diego Supercomputer Center is
given in Moore et al. (2014). They demonstrate that the annual operating cost per TB of storage
at their facility is a factor of three less for tape than for disk storage. However, to meet the
requirements of any organization, a mix of technologies is typically needed.

There has been a very significant change over the past 30 years in the capabilities that active
archives can provide for their users. One could not have imagined 30 years ago that scientists
using remote sensing data would today have most of the data available to them on-line and be able
to “work from anywhere”. Several technological advances have contributed to this change,
including the evolution of archival storage discussed in this section as well as inexpensive
storage available for users’ laptop or desktop computers and faster performance of networks.
While it is difficult to predict the technological environment 30 years into the future, it can
be expected that SSDs will become sufficiently inexpensive to support major archival operations
and enable much faster access to data by users all over the world.
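
The effect of mixing technologies on operating cost, discussed earlier in this section, can be illustrated with a simple calculation. The sketch below assumes only the factor-of-three ratio between tape and disk operating costs cited above. The function name, the archive size and the disk cost per TB are hypothetical placeholders, not figures from Moore et al. (2014).

```python
def annual_operating_cost(total_tb, tape_fraction, disk_cost_per_tb,
                          tape_to_disk_ratio=1 / 3):
    """Annual operating cost of an archive split between disk and tape.

    tape_to_disk_ratio is the tape cost per TB divided by the disk cost per TB;
    the default reflects the factor-of-three difference cited in the text.
    """
    tape_tb = total_tb * tape_fraction
    disk_tb = total_tb - tape_tb
    tape_cost_per_tb = disk_cost_per_tb * tape_to_disk_ratio
    return disk_tb * disk_cost_per_tb + tape_tb * tape_cost_per_tb


# Hypothetical 10 PB archive and a placeholder disk operating cost of 100 per TB per year.
for fraction in (0.0, 0.5, 1.0):
    cost = annual_operating_cost(10_000, fraction, 100.0)
    print(f"{fraction:.0%} on tape: {cost:,.0f} per year")
```

Such a calculation captures only the operating cost. The choice of mix also depends on the response times required by users, which is why active archives tend to keep more of their holdings on disk.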